
    Exploring Fully Offloaded GPU Stream-Aware Message Passing

    Modern heterogeneous supercomputing systems are composed of CPUs, GPUs, and high-speed network interconnects. Communication libraries that support efficient data transfers involving buffers in GPU memory typically require the CPU to orchestrate the data transfer operations. A new offload-friendly communication strategy, stream-triggered (ST) communication, was explored to allow offloading the synchronization and data movement operations from the CPU to the GPU. An implementation based on Message Passing Interface (MPI) one-sided active target synchronization was used as an exemplar to illustrate the proposed strategy. A latency-sensitive nearest-neighbor microbenchmark was used to explore the various performance aspects of the implementation. The offloaded implementation shows significant on-node performance advantages over standard MPI active RMA (36%) and point-to-point (61%) communication. The current multi-node improvement is smaller (23% faster than standard active RMA but 11% slower than point-to-point), and work is in progress to pursue further improvements. (Comment: 12 pages, 17 figures.)
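    The exemplar named above is MPI one-sided communication with active target synchronization. The following is a minimal, illustrative sketch of that pattern (a post/start/complete/wait nearest-neighbor halo exchange), not the authors' offloaded implementation; the ring neighbor choice, buffer sizes, and variable names are assumptions:

    /* Minimal PSCW (post/start/complete/wait) nearest-neighbor halo
     * exchange using MPI one-sided active target synchronization.
     * Illustrative only: ring neighbors, sizes, and names are assumed. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1024;                       /* halo size (assumed)      */
        double *win_buf;                          /* [0..n) from left nbr,
                                                     [n..2n) from right nbr   */
        double *send_buf = malloc(n * sizeof *send_buf);
        for (int i = 0; i < n; i++) send_buf[i] = (double)rank;

        MPI_Win win;
        MPI_Win_allocate(2 * n * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &win_buf, &win);

        int right = (rank + 1) % size;            /* simple ring topology     */
        int left  = (rank - 1 + size) % size;

        MPI_Group world_grp, nbr_grp;
        MPI_Comm_group(MPI_COMM_WORLD, &world_grp);
        int nbrs[2] = { left, right };
        int nnbrs   = (left == right) ? 1 : 2;    /* handle 1- or 2-rank runs */
        MPI_Group_incl(world_grp, nnbrs, nbrs, &nbr_grp);

        /* Active target synchronization: open an exposure epoch (post) and
         * an access epoch (start) against the neighbor group, push the halo
         * with MPI_Put, then close both epochs. */
        MPI_Win_post(nbr_grp, 0, win);            /* expose local window      */
        MPI_Win_start(nbr_grp, 0, win);           /* begin access epoch       */
        MPI_Put(send_buf, n, MPI_DOUBLE, right, 0, n, MPI_DOUBLE, win);
        MPI_Put(send_buf, n, MPI_DOUBLE, left,  n, n, MPI_DOUBLE, win);
        MPI_Win_complete(win);                    /* local puts are done      */
        MPI_Win_wait(win);                        /* neighbors' puts landed   */

        MPI_Group_free(&nbr_grp);
        MPI_Group_free(&world_grp);
        MPI_Win_free(&win);
        free(send_buf);
        MPI_Finalize();
        return 0;
    }

    In the paper's stream-triggered variant, the synchronization and data movement steps of this pattern are queued on a GPU stream instead of being driven by the CPU.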

    Native Mode-Based Optimizations of Remote Memory Accesses in OpenSHMEM for Intel Xeon Phi

    OpenSHMEM is a PGAS library that aims to deliver high performance while retaining portability. Communication operations are a major obstacle to scalable parallel performance and are highly dependent on the target architecture. However, to date there has been no work on how to efficiently support OpenSHMEM running natively on Intel Xeon Phi, a highly parallel, power-efficient, and widely used many-core architecture. Given the importance of communication in parallel architectures, this paper describes a novel methodology for optimizing remote memory accesses in OpenSHMEM programs executing on Intel Xeon Phi processors. In native mode, we can exploit the Xeon Phi shared memory and convert OpenSHMEM one-sided communication calls into local load/store statements using the shmem_ptr routine. This approach makes it possible for the compiler to perform essential optimizations for Xeon Phi such as vectorization. To the best of our knowledge, this is the first time the impact of shmem_ptr has been analyzed thoroughly on a many-core system. We show the benefits of this approach on the PGAS-Microbenchmarks we developed specifically for this research. Our results show a decrease in latency of up to 60% for one-sided communication operations and an increase in bandwidth of up to 12x. Moreover, we study different reduction algorithms and exploit local load/store to optimize data transfers in these algorithms for Xeon Phi, yielding improvements of up to 22% compared to MVAPICH and up to 60% compared to Intel MPI. Beyond microbenchmarks, experimental results on the NAS IS and SP benchmarks show that performance gains of up to 20x are possible.
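    A hedged sketch of the shmem_ptr technique described above: when the target PE shares memory with the caller (as in native many-core execution), a remote put can be lowered to an ordinary load/store loop that the compiler can vectorize. The buffer size, target PE choice, and fallback path are illustrative assumptions, not the paper's code:

    /* Replace a one-sided put with direct load/store via shmem_ptr when
     * the target PE is reachable through shared memory. */
    #include <shmem.h>

    #define N 4096                         /* assumed transfer size */

    int main(void)
    {
        shmem_init();
        int me     = shmem_my_pe();
        int npes   = shmem_n_pes();
        int target = (me + 1) % npes;      /* assumed target PE     */

        static double dest[N];             /* symmetric destination */
        double src[N];
        for (int i = 0; i < N; i++) src[i] = (double)me;

        /* Ask for a direct pointer to the target PE's copy of 'dest'. */
        double *remote = (double *) shmem_ptr(dest, target);

        if (remote != NULL) {
            /* Same shared-memory domain: the compiler sees an ordinary
             * loop it can vectorize. */
            for (int i = 0; i < N; i++)
                remote[i] = src[i];
        } else {
            /* No load/store path to this PE: fall back to a put. */
            shmem_double_put(dest, src, N, target);
        }

        shmem_barrier_all();               /* complete and order transfers */
        shmem_finalize();
        return 0;
    }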

    OpenSHMEM as an Effective Communication Layer for PGAS Models

    Languages and libraries based on the Partitioned Global Address Space (PGAS) programming model have emerged in recent years with a focus on addressing the programming challenges of scalable parallel systems. Among these, Coarray Fortran (CAF) is unique in that it has been incorporated into an existing standard (Fortran 2008), so it is particularly important that implementations supporting it are portable and deliver sufficient performance. OpenSHMEM is a library that is the culmination of a standardization effort among many implementers and users of SHMEM, and it provides a means to develop lightweight, portable, scalable applications based on the PGAS programming model. As such, we propose that OpenSHMEM is well situated to serve as a runtime substrate for other PGAS programming models. In this work, we demonstrate how OpenSHMEM can be exploited as a runtime layer upon which CAF may be implemented. Specifically, we re-targeted the CAF implementation provided in the OpenUH compiler to OpenSHMEM, and we show how parallel language features provided by CAF may be mapped directly to OpenSHMEM, including allocation of remotely accessible objects, one-sided communication, and various types of synchronization. Moreover, we present and evaluate algorithms we developed for implementing remote access of non-contiguous array sections and acquisition and release of remote locks using the OpenSHMEM interface. Through this work, we argue for specific features, such as block-wise strided data transfer, multi-dimensional strided data transfer, and atomic memory operations, that could be added to OpenSHMEM to better support idiomatic usage of CAF.
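    An illustrative sketch of the kind of mapping the abstract describes: a CAF coarray declaration, a coarray assignment, and "sync all" lowered onto OpenSHMEM symmetric allocation, a one-sided put, and a global barrier. The names, sizes, and the 0-based image index are assumptions; this is not the OpenUH runtime's actual interface:

    /* CAF-to-OpenSHMEM lowering sketch (hypothetical, not OpenUH code):
     *   real :: a(N)[*]        ->  symmetric allocation (shmem_malloc)
     *   a(:)[right] = b(:)     ->  one-sided put (shmem_double_put)
     *   sync all               ->  global barrier (shmem_barrier_all)   */
    #include <shmem.h>

    #define N 1000

    int main(void)
    {
        shmem_init();
        int me   = shmem_my_pe();
        int npes = shmem_n_pes();

        /* Coarray: remotely accessible object on the symmetric heap. */
        double *a = shmem_malloc(N * sizeof *a);

        /* Local (non-coarray) source array. */
        double b[N];
        for (int i = 0; i < N; i++) b[i] = (double)me;

        /* Coarray assignment to another image: one-sided put.
         * (CAF image indices are 1-based; PEs here are 0-based.) */
        int right = (me + 1) % npes;
        shmem_double_put(a, b, N, right);

        /* sync all: global barrier, which also completes the put. */
        shmem_barrier_all();

        shmem_free(a);
        shmem_finalize();
        return 0;
    }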